A Fuzzy Clustering Approach for Missing Value Imputation with Non-Parameter Outlier Test
Authors
Abstract
Missing values are a challenging issue in data mining, because information deficiency degrades both data quality and reliability. This paper presents a fuzzy clustering algorithm for missing value imputation that is robust to noisy data. The PCFKMI (Pre-Clustering based Fuzzy K-Means Imputation) method aggregates data instances into more accurate clusters, for more appropriate subsequent estimation via information entropy, after resampling-based pre-clustering and an outlier test. Experimental results show that the proposed PCFKMI achieves higher precision than other classic methods when completing both quantitative and nominal attributes, under all missingness mechanisms and at varying missing rates, even when abnormal values are present.
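To make the core idea concrete, the following is a minimal sketch of plain fuzzy k-means imputation on numeric data, written with NumPy. It is not the PCFKMI method: the abstract does not specify the resampling pre-clustering, the outlier test, or the entropy-based step, so all three are omitted here. The function names, the mean-based starting fill, the maximin seeding, and the default parameters (`n_clusters`, `m`, `n_iter`) are assumptions chosen only for illustration.

```python
import numpy as np


def _maximin_init(points, k):
    # Assumed deterministic seeding (not from the paper): start at row 0,
    # then repeatedly add the row farthest from the centroids chosen so far.
    chosen = [0]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - points[chosen][None, :, :]) ** 2).sum(axis=2)
        chosen.append(int(d2.min(axis=1).argmax()))
    return points[chosen].copy()


def fuzzy_kmeans_impute(X, n_clusters=2, m=2.0, n_iter=50):
    """Illustrative fuzzy k-means imputation; NOT the PCFKMI algorithm."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from column means so every row has a complete coordinate vector.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    centroids = _maximin_init(filled, n_clusters)

    for _ in range(n_iter):
        # Squared distance from every row to every centroid, shape (n, k).
        d2 = ((filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero

        # Standard fuzzy c-means membership update with fuzzifier m.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)

        # Centroids are means of the rows weighted by membership**m.
        W = U ** m
        centroids = (W.T @ filled) / W.sum(axis=0)[:, None]

        # Each missing cell becomes the membership-weighted mix of centroids;
        # observed cells are always restored from the original data.
        filled = np.where(missing, U @ centroids, X)

    return filled, U


# Usage: two well-separated groups, one missing cell in the first group.
X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, np.nan],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],
])
filled, U = fuzzy_kmeans_impute(X, n_clusters=2)
print(filled[2])
```

Because each estimate is a convex combination of centroids, an imputed value always stays within the observed range of its column. Nominal attributes, which the abstract also covers, would need a different distance and estimation rule and are not handled by this sketch.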
Similar resources
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic that needs careful consideration before learning from or predicting time series. Much research has been done on the use of diffe...
A classifier ensemble approach for the missing feature problem
OBJECTIVES: Many classification problems must deal with data that contain missing values. In such cases data imputation is critical. This paper evaluates the performance of several statistical and machine learning imputation methods, including our novel multiple imputation ensemble approach, using different datasets. MATERIALS AND METHODS: Several state-of-the-art approaches are compared using...
Density-based Imputation Method for Fuzzy Cluster Analysis of Gene Expression Microarray Data
Fuzzy clustering has been widely used for analysis of gene expression microarray data. However, most fuzzy clustering algorithms require complete datasets and, because of technical limitations, most microarray datasets have missing values. To address this problem, we present a new algorithm where genes are clustered using the Fuzzy C-Means algorithm (FCM). The fuzzy partition obtained is then u...
Machine Learning Based Missing Value Imputation Method for Clinical Dataset
Missing value imputation is one of the biggest tasks of data pre-processing when performing data mining. Most medical datasets are usually incomplete. Simply removing the cases from the original datasets can bring more problems than solutions. A suitable method for missing value imputation can help to produce good quality datasets for better analysing clinical trials. In this paper we explore t...
Fuzzy Unordered Rules Induction Algorithm Used as Missing Value Imputation Methods for K-Mean Clustering on Real Cardiovascular Data
Missing value imputation is one of the biggest tasks of data pre-processing when performing data mining. Most medical datasets are usually incomplete. Simply removing the cases from the original datasets can bring more problems than solutions. A suitable method for missing value imputation can help to produce good quality datasets for better analysing clinical trials. In this paper we explore t...